Krzych et al. Trials 2011, 12:212
http://www.trialsjournal.com/content/12/1/212



TRIALS 



RESEARCH Open Access 



Assessment of data quality in an international 
multi-centre randomised trial of coronary artery 
surgery 

Lukasz J Krzych1,2*, Belinda Lees1, Fiona Nugara1, Winston Banya1, Andrzej Bochenek2, Jo Cook3, David Taggart3 and Marcus D Flather1



Abstract 

Background: ART is a multi-centre randomised trial of cardiac surgery that provided a unique opportunity to evaluate data from a large number of centres in a variety of countries. We attempted to assess data quality, including recruitment rates and the timeliness and completeness of the data obtained from centres in different socio-economic strata.

Methods: The analysis was based on the 2-page CRF completed at the 6 week follow-up. CRF pages were categorised as "clean" (no edit query) or "dirty" (any incomplete, inconsistent or illegible data). Timeliness was assessed as the time interval between the visit and receipt of the complete CRF. Data quality was defined as the percentage of data queries and the time delay (in days) between the visit and receipt of correct data. Analyses were stratified according to the World Bank definitions into "Developing" countries (Poland, Brazil and India) and "Developed" countries (Italy, UK, Austria and Australia).

Results: There were 18 centres in the "Developed" and 10 centres in the "Developing" countries. The rate of enrolment did not differ significantly by economic level ("Developing": 4.1 persons/month, "Developed": 3.7 persons/month). The time interval for the receipt of data was longer for "Developing" countries (median: 37 days) than for "Developed" ones (median: 11 days) (p < 0.001). The median percentage of data queries was 23% in "Developed" countries compared with 19% in "Developing" ones (p = ns).

Conclusions: In this study we showed that data quality was comparable between centres from "Developed" and "Developing" countries. Data were received in a less timely fashion from Developing countries, and appropriate systems should be instigated to minimize any delays. Close attention should be paid to the training of centres and to the central management of data quality.

Trial registration: ISRCTN46552265 



* Correspondence: l.krzych@wp.pl
1Clinical Trials and Evaluation Unit, Royal Brompton and Harefield NHS Foundation Trust, London, UK
Full list of author information is available at the end of the article

Background

International multi-centre randomised trials are widely used to evaluate new investigational medicinal products or treatment strategies. It is essential that only accurate and verified data are collected in these trials so that the results are reliable, particularly as they may be used to inform guidance and recommendations for everyday clinical practice. However, collecting high quality data in these trials can be challenging because of several potential difficulties, e.g. the inclusion of multiple centres with different research experience, different cultures and healthcare systems, language difficulties, and the sheer number of people involved in collecting and sending data.

Quality assurance is key at every step of data management. It begins with data generation and the entry of data onto case report forms (CRFs) by centres, and it ends with statistical analysis and presentation of the results [1]. Data quality can be variable. The purpose of quality assurance is therefore not only to ensure that all data are correct, but also to ensure that any observed treatment effects are authentic and that their estimated magnitude is unbiased, so that clinical trial results are reliable. Inappropriate CRF or questionnaire management may produce bias and a lack of precision in the estimates of treatment effects [2]. Therefore, quality assurance is a cornerstone of improving data quality [3-5].

Even when systematically controlled, databases in clinical trials may include errors. For example, a multi-centre clinical trial comparing treatments for uterine cervical cancer found a data accuracy of 81.8%, and both problems in data management and a lack of clarity in the CRF were to blame [6]. Nahm et al. reported that the average error rate in published CRF-to-database comparison audits was 14.3 per 10,000 fields [7]. The issue has also been described in cardiac surgery studies [8,9]. However, these errors have often been described in registries rather than in randomised clinical trials.

The Arterial Revascularisation Trial (ART) is an international multi-centre randomised clinical trial designed to compare single internal mammary artery (IMA) grafting with bilateral IMA grafting in patients undergoing coronary artery bypass graft (CABG) surgery [10]. ART is one of the largest cardiac surgery trials ever undertaken. It therefore provides a unique opportunity to evaluate data from a large number of centres in a variety of countries with different socio-economic status, and to perform a systematic analysis of the quality of the data from the different centres.

Our main aim was to compare the quality of data obtained from centres in different socio-economic strata. We wanted to compare the following:

1. Recruitment rates across different sites, in relation to socio-economic status

2. Time differences for receipt of data at the 6 week follow-up

3. Completeness of the data, assessed by the number of data queries

Our hypothesis was that recruitment rates, data quality and the time delay in sending data are independent of the socio-economic status of the country of the participating site.

Methods 

ART is a multi-centre two-arm randomised trial designed to determine whether the use of both mammary arteries during CABG surgery, compared with using one mammary artery, improves survival and reduces the chance of recurrent angina and/or the need for further intervention (including further cardiac surgery or percutaneous coronary intervention). CABG patients with multi-vessel coronary artery disease were considered for inclusion in the study. The exclusion criteria were single graft, redo-CABG, evolving myocardial infarction and concomitant valve surgery. After giving written informed consent, patients were randomised into the trial. Patients were followed up at 6 weeks post surgery and then annually for up to 10 years. The main outcome is survival, but patients are also being followed up for myocardial infarction, angina symptoms, stroke or any other clinical adverse events [10].

ART is supported by grants from the Medical Research Council (MRC) and the British Heart Foundation (BHF). In the original funding application to the MRC and BHF, centres from the UK, Italy and Australia were identified as potential centres. However, once the study was underway, other centres from Austria, Poland, Brazil and India also expressed an interest in participating.

All centres in ART received a training visit from a member of the co-ordinating centre (CTEU, Royal Brompton Hospital, London, UK). During this visit, the requirements for data collection, completion of the CRFs and management of the data were described in a standardised format. These visits ensured that the investigators at each site (including the principal investigator, co-investigators and co-ordinators) fully understood the Protocol, the practical procedures for the study described in the Manual of Operations, and the importance of conducting the study in accordance with Good Clinical Practice (GCP). Study site co-ordinators were responsible for gathering and recording data, and for handling and resolving any edit queries.

Data collection in ART is based on a paper system with central monitoring of the data. A two-part no-carbon-required (NCR) CRF was created to collect baseline data, in-hospital surgical information and follow-up data. The participating centres were required to complete the relevant CRF pages, tear off the top copy and send these pages to the CTEU by post or fax within the obligatory timelines (Table 1). On receipt, the CTEU would first review the data and log them into the database. The data would then be entered into a bespoke database system. If any inconsistent, missing or illegible data were found, a data query would be raised. Each data query would request clarification of one or more data points and would be sent by fax to the centre for resolution. The participating centres were given a deadline of 3 weeks to return the corrected data by fax to the CTEU. If this information was not received, the centre would be sent a reminder. On receipt of the corrected information, the CTEU would update the database accordingly and close the query.

Table 1 Trial phases and time frame for receipt of study documentation

Trial document          Time frame
Screening data          Mailed to CTEU within 17 days of randomisation
In-hospital data        Mailed to CTEU within 17 days of surgery
6 week follow-up        Mailed to CTEU within 17 days of the follow-up visit
Annual follow-ups       Mailed to CTEU within 17 days of the follow-up visit
Edit queries            Faxed to CTEU within 3 weeks of receipt
Adverse event form      Faxed to CTEU within 72 hours of knowledge of the event
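The submission windows in Table 1 and the edit-query cycle described above can be expressed as simple date arithmetic. The sketch below is illustrative only. It assumes a hypothetical electronic tracker for what was, in ART, a paper-and-fax process. The names `SUBMISSION_WINDOWS`, `due_by` and `EditQuery` are invented for this example, and 72 hours is treated as exactly 3 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Windows transcribed from Table 1; the keys are hypothetical labels,
# not identifiers used by the ART co-ordinating centre.
SUBMISSION_WINDOWS = {
    "screening": timedelta(days=17),           # from randomisation
    "in_hospital": timedelta(days=17),         # from surgery
    "six_week_follow_up": timedelta(days=17),  # from the follow-up visit
    "annual_follow_up": timedelta(days=17),    # from the follow-up visit
    "edit_query": timedelta(weeks=3),          # from receipt of the query
    "adverse_event": timedelta(hours=72),      # from knowledge of the event
}


def due_by(document: str, reference: datetime) -> datetime:
    """Latest time the document should reach the CTEU (hypothetical helper)."""
    return reference + SUBMISSION_WINDOWS[document]


@dataclass
class EditQuery:
    """Hypothetical record of one data query faxed to a centre."""
    sent: datetime
    resolved: Optional[datetime] = None

    def needs_reminder(self, now: datetime) -> bool:
        # A reminder is due once the 3-week window has passed without a reply.
        return self.resolved is None and now > due_by("edit_query", self.sent)

    def close(self, corrected_received: datetime) -> None:
        # Corrected data received: the database is updated and the query closed.
        self.resolved = corrected_received
```

For example, under these assumptions a 6 week visit on 1 March would have its CRF pages due at the CTEU by 18 March.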

As described above, our main hypothesis was that recruitment rates, data quality and the time delay in sending data are independent of the socio-economic status of the country of the participating site. To test this hypothesis in this observational study, we analysed the 2-page CRF that should be completed at the 6 week follow-up. Overdue 6 week data would be chased at 60 days post randomisation (42 days + 17 days for completion and postage of the CRF pages to the CTEU). The data points from the 6 week follow-up CRF pages formed the basis for the assessment of data query generation and are shown in Additional file 1. CRF pages were reviewed and categorised as "clean" or "dirty". A clean CRF was defined as one with no edit queries on first receipt. Each CRF page was classified separately. Each variable on the two pages was categorised as either "no edit query raised" or "edit query raised". If any data were incomplete, inconsistent or illegible, the CTEU raised a data query asking the centre to clarify the data. Timeliness was assessed as the time interval between the 6 week follow-up visit and receipt of the complete (verified) CRF at the CTEU (Table 2).

Table 2 Timelines established to assess time gap between 6-week visit and receipt of data

'Clean' data: Randomisation -> Surgery -> 6 week follow-up visit -> Data received -> Data entered into a database
'Dirty' data: Randomisation -> Surgery -> 6 week follow-up visit -> Data received -> Data sent back for correction -> Corrected data received -> Data entered into a database
Time gap: from the 6 week follow-up visit to receipt of the complete (verified) CRF, as defined in the text


The number of data queries raised per patient was recorded, counting a maximum of one data query per CRF variable. The percentage of data queries per patient was then calculated out of the 42 possible queries in total (see Additional file 1). CRFs for all patients were analysed and are presented in the results. The number of recruited participants was established using the date of the first patient enrolled as the reference date. Only whole months of enrolment were included in the analysis. Rates of recruitment were expressed as the number of patients enrolled per month.
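The per-patient query percentage and the monthly recruitment rate described above reduce to short calculations. The sketch assumes one reading of "only whole months of enrolment": months are counted from the first enrolment date to a hypothetical analysis cut-off, and patients enrolled in the trailing partial month are excluded. All function names are illustrative.

```python
import calendar
from datetime import date
from typing import Iterable, List

POSSIBLE_QUERIES = 42  # data points on the 6-week CRF (Additional file 1)


def percent_queries(queried_variables: Iterable[str]) -> float:
    """Percentage of the 42 possible queries raised for one patient."""
    # set() keeps at most one query per CRF variable, as in the text.
    return 100.0 * len(set(queried_variables)) / POSSIBLE_QUERIES


def whole_months(start: date, end: date) -> int:
    """Number of complete months from start up to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return max(months, 0)


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, clamping to the last day of the month."""
    years, month0 = divmod(d.month - 1 + n, 12)
    year, month = d.year + years, month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def recruitment_rate(enrolment_dates: List[date], cutoff: date) -> float:
    """Patients per whole month from the first enrolment to a hypothetical cut-off."""
    first = min(enrolment_dates)
    months = whole_months(first, cutoff)
    if months == 0:
        raise ValueError("no whole month of enrolment before the cut-off")
    window_end = add_months(first, months)
    # Patients enrolled in the trailing partial month are excluded.
    enrolled = sum(1 for d in enrolment_dates if d < window_end)
    return enrolled / months
```

For example, with enrolments on 10 and 20 January, 15 February, 9 March and 12 March, and a cut-off of 20 March, two whole months have elapsed. Four patients fall inside them, giving 2.0 patients per month.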

Statistical analysis 

Our primary goal was to compare recruitment rates and data quality between countries. Data quality was defined as the percentage of data queries and the time delay (in days) between the 6 week follow-up visit and receipt of correct data. Analyses were stratified by socio-economic level into two categories, according to World Bank data [11]: "Developing" countries (Poland, Brazil and India) and "Developed" countries (Italy, UK, Austria and Australia). We also assessed the impact of enrolment on the number of data queries and on the time elapsed between the 6-week visit and receipt of data.
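Stratification by economic level, as described above, amounts to grouping country-level summaries by the World Bank category. In this sketch the country list is taken from the text, but the input values are hypothetical. The sketch also assumes that the stratum median is the median of the per-country medians. Pooling all patients in a stratum would be an equally plausible reading.

```python
from statistics import median
from typing import Dict, List

# Country classification as stated in the text (World Bank data [11]).
ECONOMIC_LEVEL = {
    "Poland": "Developing", "Brazil": "Developing", "India": "Developing",
    "Italy": "Developed", "UK": "Developed",
    "Austria": "Developed", "Australia": "Developed",
}


def country_medians(per_patient: Dict[str, List[float]]) -> Dict[str, float]:
    """Median of a per-patient measure (e.g. days to complete CRF) per country."""
    return {country: median(values) for country, values in per_patient.items() if values}


def stratum_medians(country_values: Dict[str, float]) -> Dict[str, float]:
    """Median of the country-level values within each economic stratum (assumed aggregation)."""
    groups: Dict[str, List[float]] = {}
    for country, value in country_values.items():
        groups.setdefault(ECONOMIC_LEVEL[country], []).append(value)
    return {level: median(values) for level, values in groups.items()}
```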

Variables are shown as arithmetic mean and standard deviation (for normally distributed quantitative variables), median (Me) and interquartile range (IQR) (for non-normally distributed quantitative data), or percentages (for qualitative data). Correlations between quantitative variables were determined using Spearman rank correlation coefficients. Between-group comparisons were performed using the Mann-Whitney U-test. Normality of distribution for continuous data was verified by the Shapiro-Wilk W-test. Non-normally distributed data underwent logarithmic transformation before further analyses. A P value < 0.05 was considered statistically significant.
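The rank-based methods named above can be sketched with the standard library alone, to show what each test computes. In practice `scipy.stats.spearmanr`, `scipy.stats.mannwhitneyu` and `scipy.stats.shapiro` would normally be used. The Shapiro-Wilk test is omitted here because its coefficients are not practical to reproduce in a short example. The Mann-Whitney p-value below uses the large-sample normal approximation without tie or continuity correction, so it will differ slightly from exact or corrected results.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """U statistic for sample a and a two-sided normal-approximation p-value."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

Both statistics depend only on ranks, so a logarithmic transformation of positive values leaves them unchanged. The transformation matters for parametric analyses rather than for these tests.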

Results 

In ART, 3102 patients were randomised at 28 centres in 7 countries over 42 months. There were 18 centres (with 2326 randomised patients) in the "Developed" and 10 centres (with 676 randomised patients) in the "Developing" countries.

The total recruitment period was 42 months. Only 6 centres recruited patients for 3 years or more. The median recruitment period was 28 months per centre (minimum 3, maximum 42): 32 months for "Developed" and 23 months for "Developing" countries (p < 0.001). The median recruitment per centre was 94 patients (minimum 6, maximum 427), with no significant difference between "Developed" and "Developing" countries (96 and 78 patients, respectively).

The overall recruitment rate was 4.4 patients per month per centre (minimum 1.8, maximum 12.1). There was a correlation between the rate of enrolment and the number of patients recruited by centres in the participating countries (R² = 0.53, p < 0.001) (Figure 1).

The median time interval between the 6 week follow-up visit and receipt of the complete CRF was 14 days (IQR: 7, 34), and the median percentage of data queries was 21% (IQR: 5, 48).

We found no correlation between the median time elapsed from the 6-week visit to receipt of data and the number of recruited patients by country (R² = 0.003, p = ns). There was also no correlation between the median percentage of data queries per country and the number of recruited patients (R² = 0.04, p = ns). Finally, there was no correlation between the median percentage of data queries and the median time elapsed from the 6-week visit to receipt of data by country (R² = 0.02, p = ns).

Neither the number of recruited patients ("Developing" countries median: 83 persons/country, "Developed" countries median: 98 persons/country; Figure 2) nor the rate of enrolment ("Developing" countries median: 4.1 persons/month, "Developed" countries median: 3.7 persons/month; Figure 3) differed statistically significantly by economic level.

Figure 4 shows the time elapsed between the 6-week visit and receipt of data per country, by economic level. The time interval was significantly longer for "Developing" countries (median: 37 days) than for "Developed" ones (median: 11 days) (p < 0.001).

The percentage of data queries at the 6-week follow-up visit per country was higher in "Developed" countries (median: 23%) than in "Developing" ones (median: 19%), but the difference was not significant (p = ns) (Figure 5).
Figure 1 Scatter diagram. Scatter diagram for correlation between the rate of enrolment and the number of recruited patients by country.

Figure 2 Impact of economic level. Impact of economic level on the number of recruited patients per country.



Discussion 

The socioeconomic status of the country did not appear to influence the number of patients recruited or the rate of recruitment. Data were received less promptly from "Developing" than from "Developed" countries, but this did not seem to affect the number of edit queries. Likewise, the number of patients enrolled did not seem to affect either the number of edit queries or the timeliness of the data. The centres with the highest rate of enrolment were those that enrolled the most patients.

The data from this study provide some reassurance to those designing and managing multi-centre trials. Using a wide variety of centres with different socioeconomic status does not appear to adversely affect the quality of data, as assessed by the number of data queries. The inclusion of multiple centres worldwide provides a number of advantages. In particular, it helps ensure that study recruitment is completed on time, and it allows the findings of the study to be applied to future patients worldwide. However, there are a number of potential challenges to consider when including centres worldwide. These include cultural differences and variability in the resources available to carry out both the surgery and study-related tasks, such as completing the CRFs and following patients up properly. In ART we also have to consider communication issues, both between the co-ordinating centre and the participating centres and between the centres and the patients. These include the reliability of postal systems and access to telephones for follow-up. Additionally, centres need to follow up patients who are admitted to other hospitals, and the systems for doing this and for obtaining the necessary medical summaries vary.

Figure 3 Impact of economic level on the rate of enrolment. Impact of economic level on the rate of enrolment (per month/centre) per country.

Figure 4 Impact of economic level on time elapsed. Impact of economic level on time elapsed (in days) between 6-week visit and receipt of data per country.

Figure 5 Impact of economic level on the percent of data queries. Impact of economic level on the percent of data queries in a 6-week follow-up visit per country; values are Me, IQR (box), range (whiskers), and extreme values (dots).

We searched Medline for other clinical trials that had evaluated data quality and found three, although none was directly related to our analyses. The first was an international oncology trial conducted in the Netherlands and Indonesia. The authors showed that using an electronic medical records system helped to reduce data error rates, especially for data critical to the primary goals of the trial [12]. They also found that the quality of data improved during the study period. Of 433 CRFs submitted for the first time, 33.7% needed some corrections, but none had more than 2 errors in the primary data. Five months after the start of the study, the error rate for the primary data items was just 1.6%. However, that analysis included only 2 countries, which limits the generalisability of its findings.

In the second study, Tolmie and colleagues assessed the quality of data submitted to the Clinical Endpoint Committee for adjudication [13]. They assessed the information submitted to the Committee in endpoint event packages from 25 countries. Data quality was rather poor: 782 queries were generated from the 1595 endpoint packages reviewed, of which 78.9% generated only one query. Interestingly, no source data queries were generated for countries with no more than 25 recruited subjects. However, both low- and high-recruiting countries had a high number of queries relating to subject identifiers. The time between a query being submitted to the sponsor and being resolved ranged from one day to 22.8 weeks (median 23, IQR 1.61) [13].

In the third study, the Type 1 Diabetes Genetics Consortium Trial, the authors reported good data quality, with a low percentage of missing data and a low duplicate data entry error rate (up to 0.5%) [14]. Using an electronic data entry system, they found some differences in data collection between the 214 participating centres. The highest error rates were found in Asia-Pacific countries and the United Kingdom, and the lowest in European and North American centres.

To address the potential challenges of involving multiple centres worldwide, Aitken et al. suggested using multidimensional strategies to administer such trials. These approaches include using experienced project coordinators, increasing communication between centres, implementing strategies to optimise intervention compliance, using site-specific recruitment and retention techniques, centralising data management, and considering ethical and budgetary requirements at local sites [15]. Frank et al. made several recommendations for meeting high recruitment goals and ensuring high study quality. Trials should have bilingual investigators and staff members who spend time at one another's sites. They should also make use of frequent conference-call staff meetings and remain flexible within the bounds of the sometimes contradictory requirements of the local authorities [16]. At site level, a collaborative relationship between researchers and practice staff is important for recruitment, retention and data quality [17]. Study leadership and the experience of clinical centre staff also contribute to good trial performance. Establishing an organisational structure that provides leadership, site-to-site communication, understandable performance criteria, a proper process for data monitoring, and feedback may therefore help to ensure the success of a clinical trial [18].

There are some limitations to our current study. First, we used a paper-based system to collect data and to validate their quality. Medical record abstraction is the most significant source of errors and should be measured and managed appropriately and in a timely fashion during the course of the trial. Researchers and co-ordinating centres are transitioning from paper systems to electronic data capture systems. These are being successfully integrated into clinical practice and are believed to yield higher quality data than paper-based systems [19]. Moreover, there have been many attempts to quantify data quality in clinical trials using electronic data collection, which gives reproducible quality control and makes trials more valid and scientifically rigorous [5,7,15,20]. At the co-ordinating centre for ART, the CTEU performed central monitoring of data to ensure the consistency and completeness of the dataset. Second, in our analysis we grouped centres by country and then by socioeconomic status without examining data from individual centres. This may have masked wide variation in data quality within countries. However, this approach was used intentionally to preserve the anonymity of centres. Finally, this paper focuses only on 6 week data, which are collected close to the date of surgery. On the one hand, this avoids difficulties with patient follow-up (i.e. bias caused by loss to follow-up). On the other hand, we cannot exclude the possibility that annual follow-up data would give different results, and this should be investigated in further studies.

Conclusions 

This study provides evidence that in a large multi-centre trial, rates of recruitment, total recruitment and data quality can be comparable between centres from "developed" and "developing" countries. Close attention should be paid to the training of centres and to the central management of data quality. Data may be received in a less timely fashion from developing countries, and appropriate systems should be instigated to minimize any delays. Achieving accurate and timely data is an essential step in the good conduct of a clinical trial.

Additional material 



Additional file 1: Data points from the 6 week visit CRF to be used 
in the edit query analysis 